Ensemble Methods for Automatic Thesaurus Extraction
نویسنده
چکیده
Ensemble methods are state of the art for many NLP tasks. Recent work by Banko and Brill (2001) suggests that this would not necessarily be true if very large training corpora were available. However, their results are limited by the simplicity of their evaluation task and individual classifiers. Our work explores ensemble efficacy for the more complex task of automatic thesaurus extraction on up to 300 million words. We examine our conflicting results in terms of the constraints on, and complexity of, different contextual representations, which contribute to the sparsenessand noise-induced bias behaviour of NLP systems on very large corpora.
منابع مشابه
Improvements In Automatic Thesaurus Extraction
The use of semantic resources is common in modern NLP systems, but methods to extract lexical semantics have only recently begun to perform well enough for practical use. We evaluate existing and new similarity metrics for thesaurus extraction, and experiment with the tradeoff between extraction performance and efficiency. We propose an approximation algorithm, based on canonical attributes and...
متن کاملA Fault Diagnosis Method for Automaton based on Morphological Component Analysis and Ensemble Empirical Mode Decomposition
In the fault diagnosis of automaton, the vibration signal presents non-stationary and non-periodic, which make it difficult to extract the fault features. To solve this problem, an automaton fault diagnosis method based on morphological component analysis (MCA) and ensemble empirical mode decomposition (EEMD) was proposed. Based on the advantages of the morphological component analysis method i...
متن کاملA Fault Diagnosis Method for Automaton Based on Morphological Component Analysis and Ensemble Empirical Mode Decomposition
In the fault diagnosis of automaton, the vibration signal presents non-stationary and non-periodic, which make it difficult to extract the fault features. To solve this problem, an automaton fault diagnosis method based on morphological component analysis (MCA) and ensemble empirical mode decomposition (EEMD) was proposed. Based on the advantages of the morphological component analysis method i...
متن کاملCorpus Based Extraction of Hypernyms
We present two methods of automatic hypernym extraction implemented within a dictionary editing and writing system for terminological thesaurus for land surveying domain: a specialised corpus based method and a term similarity approach. Challenges of both techniques are briefly discussed.
متن کاملOntologising Relational Triples into a Portuguese Thesaurus
Having in mind the automatic acquisition and integration of knowledge from different heterogeneous resources, this paper proposes several automatic methods for attaching term-based relational triples to the synsets of a thesaurus, without exploiting the extraction context for disambiguation. After using the proposed methods to attach triples, extracted from a Portuguese dictionary, to the synse...
متن کامل